Macrocyclization of linear molecules by deep learning to facilitate macrocyclic drug candidates discovery

Interest in macrocycles as potential therapeutic agents has increased rapidly. Macrocyclization of bioactive acyclic molecules provides a potential avenue to yield novel chemical scaffolds, which can contribute to the improvement of the biological activity and physicochemical properties of these molecules. In this study, we propose a computational macrocyclization method based on the Transformer architecture (which we name Macformer). Leveraging deep learning, Macformer explores the vast chemical space of macrocyclic analogues of a given acyclic molecule by adding diverse linkers compatible with the acyclic molecule. Macformer can efficiently learn the implicit relationships between acyclic and macrocyclic structures represented as SMILES strings and generate plenty of macrocycles with chemical diversity and structural novelty. In data augmentation scenarios using both internal ChEMBL and external ZINC test datasets, Macformer displays excellent performance and generalisability. We showcase the utility of Macformer when combined with molecular docking simulations and wet-lab-based experimental validation, by applying it to the prospective design of macrocyclic JAK2 inhibitors.

1. The prospective case study on developing a JAK2 inhibitor does not support the claimed advantages of adopting macrocyclic compounds. The authors claimed that "Comparing with the linear analogues, macrocycles tend to possess pre-organized restricted conformations and extended contacts with targets, thus potentially achieving improved binding affinities, better selectivities and superior pharmacological characteristics." In the enzyme inhibitory assay and the cellular antiproliferative assay shown in Table 3, the newly designed cyclic compounds 1-3 show no improvement over the Fedratinib control. In the subsequent kinase selectivity test and in vivo activity test, the Fedratinib control is simply missing. Thus, from the presented data, there is no evidence of "improved binding affinities, better selectivities, and superior pharmacological characteristics".
1. Difficulty understanding the computational methods and the concepts behind them: I found it hard to understand the computational methods described. The concepts are not well described and are rather cryptic. There are many examples of studies that describe complex computational methods but manage to describe the concepts in an understandable way (e.g. the recent Cell paper by the Baker lab, which I read and enjoyed as it clearly describes the concept(s) of the new approach). Without making the concepts/principles understandable for the medchem community, this work is of little value, and for reviewers like me, it is not possible to evaluate the concepts behind the software tool.

Taking advantage of structural information:
It appears to me that the software tool does not take advantage of the structural information to propose linker structures suited to cyclize acyclic ligands. In particular for the applied example, there is an X-ray structure of JAK2 with Fedratinib bound, which can be taken to measure the distance between the two points that need to be connected, as well as their orientation. A computational tool that does not use this information will never be superior to a conventional approach where structure information is taken into account and potential linkers are modeled.

Comparison with conventional methods:
The authors do not compare their tool side by side with conventional methods. Reading the literature, I found that several groups and companies have already macrocyclized acyclic ligands of JAK kinases, and a comparison would be required. For the reasons described in point 2, it is essentially impossible that the developed tool can be better than conventional methods that take advantage of available structure information.
4. The "identified" linker in compound 3 is already reported in literature: The linker of the best of the three compounds tested is already reported in the literature. It is present in the approved drug Pacritinib, which is a macrocyclic JAK inhibitor (!). The authors do not mention this inhibitor in their report, nor do they describe relevant efforts to macrocyclize acyclic JAK inhibitors by other methods. An audience seeing the results of this work would immediately think that the authors have been cheating and have simply stolen the linker from the approved drug. They would not believe that it was truly identified by the new software tool. Personally, I think that presenting the data as done herein, without mentioning that this linker was found in other macrocyclic ligands of JAK targets (and even in an approved drug), is not honest and is poor scientific practice.
Reviewer #4 (Remarks to the Author): The mechanisms used for drug design and selection look interesting. The colitis part in this manuscript is weak, however. First, I think that Jak2 is certainly not considered as a good target for gut inflammation in humans given the issues on myelotoxicity, and thus studies in humans currently focus on Jak1 and to a lesser extent on Tyk2. Second, the mouse model used is a model of acute intestinal injury and does not mimic chronic inflammation in humans. Finally, the authors showed prevention studies but did not demonstrate the therapeutic value of their compound.

To Reviewer #1 (Remarks to the Author):
The enzymatic assays appear to be carried out appropriately and highlight similar levels of activity to Fedratinib, particularly for compound 3.
Response: We appreciate the reviewer's helpful comments on our work. Accordingly, we have revised our manuscript fully taking into account the reviewer's concerns.
2. The proliferation effects are also similar to Fedratinib. The rationale for the use of the colitis model is not explained and some additional information is required.

Response:
The rationale for the use of the colitis model has been explained in the revised manuscript. Overexpression of JAK2 has been reported in patients with inflammatory bowel disease (IBD), which suggests that JAK2 may be a therapeutic target for IBD. To assess the therapeutic effects of compound 3 and Fedratinib, we established the dextran sulfate sodium (DSS)-induced colitis model. The DSS colitis model recapitulates many clinical and pathological features of human IBD, such as bloody stools, weight loss, diarrhea, and inflammatory cell infiltration, and is widely used in IBD research as a preclinical model.

3. It would be best to include a graph with the histology score and to improve the readability of the scale on the histology images.
Response: According to the reviewer's kind suggestion, the histological scores based on the Ameho criteria have been added (Fig. 7d) in the revised manuscript.

Is there a comparison of the activity with Fedratinib?
Response: We repeated the in vivo efficacy evaluation and added a Fedratinib control group. The added pharmacokinetic study showed that Fedratinib has a shorter half-life (T1/2, 4.70 vs 10.07 h) and lower systemic exposure (AUCinf, 50.19 vs 114.69 h*ng/mL) than compound 3 after oral dosing. Fedratinib was therefore administered at twice (10 mg/kg) the dose of compound 3 (5 mg/kg). The in vivo test showed that both compound 3 and Fedratinib are able to ameliorate the symptoms in DSS-induced murine colitis, and that compound 3 (5 mg/kg) achieved therapeutic efficacy comparable to that of Fedratinib (10 mg/kg) at a lower dose.
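As a rough check on the dosing rationale in this response, the ratios of the quoted half-lives and exposures can be computed directly. This is only an illustrative sketch using the values given above; the variable names are ours, and reading the exposure ratio as a dose-scaling factor assumes approximately linear pharmacokinetics, which the response does not state.

```python
# PK values quoted in the response (oral dosing). Names are illustrative only.
t_half_h = {"Fedratinib": 4.70, "compound 3": 10.07}    # half-life, h
auc_inf = {"Fedratinib": 50.19, "compound 3": 114.69}   # AUCinf, h*ng/mL

# Ratio of compound 3 to Fedratinib for each parameter.
half_life_ratio = t_half_h["compound 3"] / t_half_h["Fedratinib"]
exposure_ratio = auc_inf["compound 3"] / auc_inf["Fedratinib"]

print(f"half-life ratio: {half_life_ratio:.2f}")
print(f"exposure ratio:  {exposure_ratio:.2f}")
```

Both ratios come out slightly above 2, which is in line with the choice to dose Fedratinib at twice the level of compound 3.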

5. The rationale for the method of administration of the compound should also be explained.
Response: Compound 3 and Fedratinib were given to the mice by oral gavage in our study. Oral gavage is a routine administration method that was also used in the previously reported in vivo test of Fedratinib in mice 1. Because of their low solubility, the two compounds were formulated in the solubilizing vehicle (5% DMSO/30% PEG400/65% saline). The vehicle alone was also given intragastrically to a control group and was shown to be safe for the mice in the test.

What was the sample dissolved in for the in vitro and in vivo experiments?
Response: The solvent information for the in vitro and in vivo experiments has been added in the revised manuscript. Stock solutions of test compounds were made in dimethylsulfoxide (DMSO) and subsequently diluted in different solvents for different experiments. For the enzyme assay, the compounds were diluted in the kinase buffer (50 mM HEPES pH 7.5, 10 mM MgCl2, 1 mM EGTA, 0.01% Brij-35) of the Z'-LYTE™ kinase assay kit. In the cell proliferation, Western blot, and kinome selectivity assays, the compounds were diluted in RPMI-1640 medium. In the in vivo pharmacokinetic and efficacy evaluation tests, test compounds were prepared in the solubilizing vehicle (5% DMSO/30% PEG400/65% saline).

Was the stability of the different compounds assessed and compared to Fedratinib? This is generally a critical point in macrocyclic studies.

Response:
We first evaluated the stability of compound 2, compound 3, and Fedratinib in DMSO stock solution. Test compounds were incubated at room temperature and at 60 °C for 72 h, and their purities (%) were determined by HPLC using a CORTECS C18 column (4.6 × 50 mm, 2.7 μm particle size) and a UV/VIS detector set to λ = 210 nm. All compounds were eluted with the two solvent systems (ammonium formate as organic phase in Method I and CH3OH as organic phase in Method II) at a flow rate of 0.3 mL/min. Each experiment was repeated three times, and the results are presented as mean ± SD. As illustrated in
Meanwhile, the in vivo pharmacokinetic properties of compound 3 and Fedratinib in mice following intravenous (iv, 5 mg/kg) and oral (po, 5 mg/kg) administration were investigated. As shown in Table 4 of the revised manuscript, compound 3 displayed overall superior PK properties to Fedratinib. After oral dosing, compound 3 showed
following intravenous (iv, 5 mg/kg) and oral (po, 5 mg/kg) administration were also investigated. The details can be found in the revised manuscript, and the results are summarized as follows.
3) In vivo activity
Both compound 3 and Fedratinib are able to ameliorate the symptoms in DSS-induced murine colitis, and compound 3 (5 mg/kg) achieved therapeutic efficacy comparable to that of Fedratinib (10 mg/kg) at a lower dose.

1) Kinase selectivity
Overall, the improved selectivity and superior pharmacological characteristics of compound 3 have been demonstrated in the revised manuscript. Generally speaking, some but not all of the properties in terms of binding affinities, selectivities, and pharmacological characteristics will be improved after macrocyclization. Therefore, the description has been modified to a more precise statement in the revised manuscript as "thus potentially achieving improved binding affinities, better selectivities or superior

Response:
We thank the reviewer for the kind advice. The sentence has been modified to a more precise statement as "… macrocycles are regarded as a privileged chemotype for targeting some challenging proteins …" in the revised manuscript. Additionally, we added some specific challenging proteins that can be targeted by macrocycles as examples: "For example, macrocycles predominate the marketed inhibitors of hepatitis C virus (HCV) NS3/4A, the shallow and solvent-exposed groove of which is difficult to harbor small molecules. The advantages of macrocycles have also been reported in modulating protein-protein interactions (PPIs) with large flat and dynamic surfaces".

In the section of Model overview. 'the number of macro ring is inferior to 1'.
What is the definition here of a macro ring? Is it by counting the number of heavy atoms?
Response: Consistent with the definition of a macrocycle, a macro ring is the ring structure containing 12 or more atoms. We have added the statement in the Model overview section of the revised manuscript. In fact, although the atom indexes of macrocycles were re-ordered according to that of the acyclic substructure by entering the list we pre-arranged, RDKit would alter atom order on-the-fly to prevent strange combinations. This ensures that more reasonable SMILES are generated on the basis of aligning the input and output strings as much as possible.
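The ring-size criterion stated in this response can be sketched as follows. The function name and the toy ring lists are hypothetical and are not taken from the Macformer code; in practice the rings would come from a cheminformatics toolkit (for example, RDKit's `GetRingInfo().AtomRings()`), which is assumed here rather than called.

```python
MACRO_RING_MIN_ATOMS = 12  # threshold stated in the response: 12 or more atoms

def count_macro_rings(rings, min_atoms=MACRO_RING_MIN_ATOMS):
    """Count rings that contain at least `min_atoms` ring atoms.

    `rings` is an iterable of atom-index tuples, one tuple per ring
    (an assumed input format, mirroring what ring-perception tools return).
    """
    return sum(1 for ring in rings if len(ring) >= min_atoms)

# Toy example (hypothetical): one 6-membered ring and one 14-membered ring.
toy_rings = [tuple(range(6)), tuple(range(6, 20))]
print(count_macro_rings(toy_rings))
```

Under this reading, a molecule qualifies as a macrocycle when the count is at least 1, and the count depends only on ring atoms, not on substituents.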

Models
The number of random SMILES strings for each macrocyclic compound varies widely. If enumerating all possibilities, it will cause a serious imbalance in the data set.
Following the work of Moret et al. 3, augmentation levels of 2X, 5X, and 10X were defined. As shown in our manuscript, the 5-fold augmentation performed well on all metrics, and the 10-fold augmentation did not result in further significant improvement.

As authors pointed out that data augmentation helps in learning the basic syntax of the chemical language to produce chemically meaningful SMILES strings.
Response: We thank the reviewer for pointing out the inappropriate statement, which has been deleted in the revised manuscript.
8. In Fig. 2, t-SNE plots were reported for generated and raw ChEMBL compounds. Adding generated ZINC macrocyclic compounds can be interesting to compare as those compounds were never seen during the training.
Response: We thank the reviewer for pointing out the shortcomings of our study, which is of great help in improving the quality of our paper. Accordingly, we have revised our manuscript fully taking into account the reviewer's concerns.
Response: According to the reviewer's critical comments, we have modified our manuscript. While presenting the necessary experimental details of the computational method to help researchers replicate our results, we added some descriptions of the concepts behind our computational method and why we chose this method. We hope the revised manuscript will help more researchers in related fields understand the new method; as pointed out by the reviewer, this is indeed crucial for demonstrating the value of our method.

Taking advantage of structural information:
It appears to me that the software tool does not take advantage of the structural information to propose linker structures suited to cyclize acyclic ligands. In particular for the applied example, there is an X-ray structure of JAK2 with Fedratinib bound, which can be taken to measure the distance between the two points that need to be connected, as well as their orientation. A computational tool that does not use this information will never be superior to a conventional approach where structure information is taken into account and potential linkers are modeled.
In the revised manuscript, we compared our deep learning method with the traditional method. The issue will be discussed in detail in the next question.

Comparison with conventional methods:
The authors do not side-by-side compare their tool with conventional methods.
In summary, we think the above results have demonstrated the applicability of our method in identifying novel potent lead compounds that the traditional method might have missed. It is expected that, as a new method that constructs macrocycles from a different view and a powerful complement to the traditional macrocyclization method, Macformer will play a valuable role in the design of macrocyclic drug candidates.

The "identified" linker in compound 3 is already reported in literature:
The
Response: We thank the reviewer for the positive assessment on our strategy used for macrocyclic drug candidate design. Regarding the three issues of the colitis model that the reviewer is concerned about, we will elaborate one by one in the following sections.
In summary, continued efforts are needed to clarify the potential therapeutic efficacy of JAK2 inhibitors in IBD. We hope that our preliminary exploration can play a role in attracting new ideas.

Second, the mouse model used is a model of acute intestinal injury and does not mimic chronic inflammation in humans.
Response: Human IBD is a chronic, relapsing inflammatory disorder of the gastrointestinal tract; its etiology and pathogenesis are complicated and still uncertain. Due to its rapidity, simplicity, and reproducibility, the dextran sulfate sodium (DSS)-induced colitis model is widely used in IBD research as a preclinical model. Although there are differences between the acute DSS colitis model in our study and human IBD, they share many clinical and pathological features, such as bloody stools, weight loss, diarrhea, and inflammatory cell infiltration. The acute colitis model has been used for studying the pathogenesis of IBD as well as for evaluating the efficacy of many drug candidates, including those involved in JAK/STAT signaling 11-13.
3. Finally, the authors showed prevention studies but did not demonstrate the therapeutic value of their compound.
Response: As described in the In vivo Efficacy Study section of the Methods part of the manuscript, male BALB/c mice were given 3.5% DSS water daily for 7 days to induce colitis, and the tested compounds were administered from day 8. As shown in

REVIEWER COMMENTS
Reviewer #1 (Remarks to the Author): The authors have addressed my concerns.
Reviewer #2 (Remarks to the Author): Authors have successfully addressed questions and comments the reviewer had. The reviewer considers the revision to be complete and thorough.
Reviewer #3 (Remarks to the Author): The authors have discussed extensively the questions raised and they have added text/made larger changes, but they have not addressed my concerns, which I repeat below. I think that this work does not have the required relevance and quality to be published, neither in Nature Communications nor in a more specialized journal.
1. Conceptual problem with approach and poor description/rationalization of concepts: It remains still unclear to me why the proposed strategy/workflow should be able to predict optimal linkers that turn linear ligands into good macrocyclic ligands. The predictions are made based on a tool that is trained with a large number of random macrocycles binding random targets. The structure of the target is not entered into the equation. I thus think that the "macformer" tool cannot propose a linker that leads to optimal binding to a specific target (because the tool does not know the target!). If the authors think that this is nevertheless possible, they should demonstrate this with a compelling example (which they did not).

Prediction without structural information:
By all means, I cannot understand how a linker can be proposed if the structural context is not taken into account. An optimal linker depends absolutely on the structure of the target protein, as its trajectory must not clash with the protein surface and at the same time should not be too distant, so that it can pick up contacts with the protein to increase binding affinity.

Comparison to conventional methods:
In the revised manuscript, the authors describe an effort that they have made. However, the description is hard to follow. The parts that I can understand are not at all convincing to me.
4. The "identified" linker was already reported: As described before, the described linker was found before. The authors declare this now. However, a reader would still think that the authors have simply been cheating by taking a linker that proved good before. If the "macformer" tool is truly working (and I have doubts that it does), the authors would need to apply it to other targets and come up with linkers that were not reported for the same target. And they should show that the developed macrocycles have truly better properties like binding affinity or specificity.
Reviewer #4 (Remarks to the Author): To Reviewer #4 (Remarks to the Author): The mechanisms used for drug design and selection look interesting. The colitis part in this manuscript is weak, however. First, I think that Jak2 is certainly not considered as a good target for gut inflammation in humans given the issues on myelotoxicity, and thus studies in humans currently focus on Jak1 and to a lesser extent on Tyk2. Second, the mouse model used is a model of acute intestinal injury and does not mimic chronic inflammation in humans. Finally, the authors showed prevention studies but did not demonstrate the therapeutic value of their compound.
Response: We thank the reviewer for the positive assessment on our strategy used for macrocyclic drug candidate design. Regarding the three issues of the colitis model that the reviewer is concerned about, we will elaborate one by one in the following sections.
1. First, I think that Jak2 is certainly not considered as a good target for gut inflammation in humans given the issues on myelotoxicity and thus studies in humans currently focus on Jak1 and to a lesser extent on Tyk2.
Response: In summary, continued efforts are needed to clarify the potential therapeutic efficacy of JAK2 inhibitors in IBD. We hope that our preliminary exploration can play a role in attracting new ideas.
REPLY TO AUTHORS: Sorry but this is not convincing. Simply citing some papers on JAK2 expression does not imply that this is a good target. Nobody in the IBD field would consider JAK2 as a good target (in contrast to JAK1). The authors have not addressed this point.
3. Finally, the authors showed prevention studies but did not demonstrate the therapeutic value of their compound.
Response: As described in the In vivo Efficacy Study section of the Methods part of the manuscript, male BALB/c mice were given 3.5% DSS water daily for 7 days to induce colitis, and the tested compounds were administered from day 8. As shown in

To Reviewer #3 (Remarks to the Author):
The authors have discussed extensively the questions raised and they have added text/made larger changes, but they have not addressed my concerns, which I repeat below. I think that this work does not have the required relevance and quality to be published, neither in Nature Communications nor in a more specialized journal.

Response:
We politely disagree with the reviewer's comments on our work. We think that the differences in areas of expertise and background knowledge prevent the reviewer from fully understanding our work. Accordingly, to better illustrate our work and address the reviewer's concerns, we would like to discuss some conceptual issues with respect to the macrocyclization task first.
The first three questions of the reviewer mainly focus on the absence of target information in Macformer when inferring the linkers and on the comparison with conventional methods. So we would like to start by reviewing the reported macrocyclization work.
The reported successful rational designs of macrocycles against a specific target are commonly summarized as "structure-based". Here, the "structure" actually covers two aspects: the structure of the acyclic/macrocyclic compound and the structure of the target. This process can usually be divided into two steps, the addition of macrocyclic to the absence of explicit targets for many bioactive macrocycles, the target information was not involved in Macformer. Therefore, in the JAK2 macrocyclic inhibitor design, the macrocycles generated by Macformer were docked into the ATP binding site of JAK2 to further evaluate their interactions with the target, which were used as an important criterion for subsequent compound selection.
In the revised manuscript, we have clarified the function of Macformer in the design paradigm of the macrocyclic drug candidates. Meanwhile, the reliance of Macformer on methods that assess the macrocycle-target interactions, such as molecular docking, is highlighted in the revised manuscript.
1. Conceptual problem with approach and poor description/rationalization of concepts: It remains still unclear to me why the proposed strategy/workflow should be able to predict optimal linkers that turn linear ligands into good macrocyclic ligands. The predictions are made based on a tool that is trained with a large number of random macrocycles binding random targets. The structure of the target is not entered into the equation. I thus think that the "macformer" tool cannot propose a linker that leads to optimal binding to a specific target (because the tool does not know the target!). If the authors think that this is nevertheless possible, they should demonstrate this with a compelling example (which they did not).
Response: As pointed out by the reviewer, the target information was not involved in Macformer. Given an acyclic molecule, the purpose of Macformer is to generate diverse and novel macrocyclic analogues before further evaluation of their binding potential against the target of interest. Accordingly, we cannot guarantee that all the linkers proposed by Macformer will lead to optimal binding to a specific target. In the prospective design of novel macrocyclic JAK2 inhibitors, we used molecular docking to evaluate the interactions with the target and to filter the macrocycles generated by Macformer. In fact, as illustrated in Fig. S7

Comparison to conventional methods:
In the revised manuscript, the authors describe an effort that they have made. However, the description is hard to follow. The parts that I can understand are not at all convincing to me.

Response:
To the best of our knowledge, the majority of the reported works on rational design of macrocyclic drug candidates are based on experts' experience and knowledge of medicinal chemistry. These works usually give a lengthy presentation of the discovery of a specific macrocyclic drug candidate, without comparisons with other methods. Hence, it is a thorny issue to define what the conventional methods are.
According to the discussion we put forward at the beginning of the response letter, the function of Macformer in macrocyclic drug candidate design is to explore the vast chemical space of the macrocyclic analogues of a given acyclic molecule in the initial stage by adding linkers compatible with the acyclic molecule. The generated macrocycles then need to be filtered and validated through other molecular simulation methods, such as docking. We think it is reasonable to compare Macformer with methods that have the same function. In the initial stage of macrocyclization, the selection of linkers is mainly driven by the empirical knowledge of medicinal chemists; the final linkers are commonly presented directly after a short description of the purpose, while the detailed process is absent. This manual and nonstandardized procedure could not serve as a baseline for comparison.
Instead, we used the linker database searching computational method as the baseline, which applies geometric criteria, e.g., distance and angle compatibility between the atoms to be connected, to form initial macrocycles based on the three-dimensional (3D) structure of an acyclic ligand (termed the MacLS method in our manuscript). In fact, we think this computational method is consistent with the reviewer's suggestion in the second question of the first round of review, that is, "there is an X-ray structure of JAK2 with Fedratinib bound, which can be taken to measure the distance between the two points that need to be connected, as well as their orientation." It should be emphasized that, for a fair comparison, the structure of the target is also not considered in MacLS. In the revised manuscript, we modified the description of the MacLS method for better understanding. Additionally, Fig. S2 in the supplementary information provides a visual representation to facilitate understanding.

4. The "identified" linker was already reported: As described before, the described linker was found before. The authors declare this now. However, a reader would still think that the authors have simply been cheating by taking a linker that proved good before. If the "macformer" tool is truly working (and I have doubts that it does), the authors would need to apply it to other targets and come up with linkers that were not reported for the same target. And they should show that the developed macrocycles have truly better properties like binding affinity or specificity.

Response:
We would like to address the concern of the reviewer from three aspects.
1) The source SMILES file that was used as the input for generation of macrocyclic analogues of Fedratinib has been uploaded to GitHub (https://github.com/yydiao1025/Macformer/blob/main/data/src_fedratinib.txt). With the source code of Macformer and the pre-trained models, it is not a difficult task to verify whether Macformer can actually generate the three macrocycles we have synthesized.
2) As pointed out by the reviewer in the first round of review, "there is an X-ray structure of JAK2 with Fedratinib bound" (PDB code 6VNE), but if we "measure the distance between the two points that need to be connected, as well as their orientation", the linker of compound 3 would not be considered an appropriate macrocyclic linker for Fedratinib.
Because of the structural difference between Pacritinib and Fedratinib, the additional -NH-group makes the two terminal phenyl groups of Fedratinib close to each other.
The close distance between the two macrocyclization connection points, atoms labeled as a2 and a3 in Fig. R1a
Herein, an attachment vector is the bond between the atom at the cyclization site and the leaving atom that will not be contained in the generated macrocycle. As shown in Fig. R1a, each attachment vector of Fedratinib is the bond between a cyclization-site atom and its leaving atom. When a macrocycle is formed, the leaving atoms of Fedratinib and those of the linker will not be contained in the macrocycle. Ideally, the differences between the corresponding distances and between the corresponding dihedral angles of Fedratinib and the linker should be 0. Larger differences mean worse compatibility between Fedratinib and the linker. As shown in Fig. R1b and Table R1, among the 86 conformations, none satisfies the distance and dihedral angle cutoffs (0.5 Å for distance and 20° for the dihedral angle, as used in MacLS). To sum up, if we only consider the compatibility between the attachment vectors of Fedratinib and the linkers, which follows the conventional structure-based macrocyclization concept that tries to keep the conformation of the starting acyclic molecule, the linker of compound 3 would not be considered an appropriate macrocyclic linker for Fedratinib. On the contrary, Macformer abandoned the strict limits on the 3D structure of the ligand and deduced the linker of compound 3, which was validated to exhibit improved kinome selectivity and PK properties compared with Fedratinib.
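The geometric compatibility check described above can be sketched in a few lines. This is a minimal illustration, not the MacLS implementation: the function names are hypothetical, and the choice of which two distances to compare (cyclization site to cyclization site, and leaving atom to leaving atom) is our assumption, since the response only states that two distance differences and one dihedral difference are evaluated against cutoffs of 0.5 Å and 20°.

```python
import math

DIST_CUTOFF = 0.5       # Å, distance cutoff quoted for MacLS
DIHEDRAL_CUTOFF = 20.0  # degrees, dihedral cutoff quoted for MacLS

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    norm_b2 = math.sqrt(_dot(b2, b2))
    m1 = _cross(n1, tuple(x / norm_b2 for x in b2))
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))

def attachment_geometry(leave1, site1, site2, leave2):
    """Assumed descriptors: site-site distance, leaving-leaving distance, dihedral."""
    return (math.dist(site1, site2),
            math.dist(leave1, leave2),
            dihedral(leave1, site1, site2, leave2))

def is_compatible(ligand, linker,
                  dist_cutoff=DIST_CUTOFF, dihedral_cutoff=DIHEDRAL_CUTOFF):
    """True if both distance differences and the dihedral difference pass the cutoffs."""
    d_site = abs(ligand[0] - linker[0])
    d_leave = abs(ligand[1] - linker[1])
    # Wrap the angular difference into [0, 180] degrees.
    d_phi = abs((ligand[2] - linker[2] + 180.0) % 360.0 - 180.0)
    return (d_site <= dist_cutoff and d_leave <= dist_cutoff
            and d_phi <= dihedral_cutoff)

# Toy coordinates (hypothetical, in Å); not taken from PDB 6VNE.
ligand_geom = attachment_geometry((1, 0, 0), (0, 0, 0), (0, 0, 4), (0, 1, 4))
stretched = attachment_geometry((1, 0, 0), (0, 0, 0), (0, 0, 5), (0, 1, 5))
twisted = attachment_geometry((1, 0, 0), (0, 0, 0), (0, 0, 4), (1, 0, 4))
print(is_compatible(ligand_geom, ligand_geom))
print(is_compatible(ligand_geom, stretched))
print(is_compatible(ligand_geom, twisted))
```

In a screen of this kind, a linker would be rejected when none of its conformations passes the check, which is the situation the response reports for the linker of compound 3.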